Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Like other boosting methods, it builds the model in a stage-wise fashion, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

The idea of gradient boosting originated in the observation by Leo Breiman〔Breiman, L. "Arcing the Edge" (June 1997)〕 that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman〔Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine" (February 1999)〕〔Friedman, J. H. "Stochastic Gradient Boosting" (March 1999)〕 simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean. Mason et al. introduced the abstract view of boosting algorithms as iterative ''functional gradient descent'' algorithms, that is, algorithms that optimize a cost ''functional'' over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of machine learning and statistics beyond regression and classification.

== Informal introduction ==
(This section follows the exposition of gradient boosting by Li.)

Like other boosting methods, gradient boosting combines weak learners into a single strong learner in an iterative fashion. It is easiest to explain in the least-squares regression setting, where the goal is to learn a model <math>F</math> that predicts values <math>\hat{y} = F(x)</math>, minimizing the mean squared error <math>(\hat{y} - y)^2</math> to the true values <math>y</math> (averaged over some training set).
At each stage <math>m</math> of gradient boosting, it may be assumed that there is some imperfect model <math>F_m</math> (at the outset, a very weak model that just predicts the mean <math>\bar{y}</math> of the training set could be used). The gradient boosting algorithm does not change <math>F_m</math> in any way; instead, it improves on it by constructing a new model that adds an estimator <math>h</math> to provide a better model <math>F_{m+1}(x) = F_m(x) + h(x)</math>. The question is now how to find <math>h</math>. The gradient boosting solution starts with the observation that a perfect <math>h</math> would imply
: <math>F_{m+1}(x) = F_m(x) + h(x) = y</math>
or, equivalently,
: <math>h(x) = y - F_m(x)</math>.
Therefore, gradient boosting fits <math>h</math> to the ''residual'' <math>y - F_m(x)</math>. As in other boosting variants, each <math>F_{m+1}</math> learns to correct its predecessor <math>F_m</math>.

A generalization of this idea to loss functions other than squared error (and to classification and ranking problems) follows from the observation that the residuals <math>y - F(x)</math> are the negative gradients of the squared error loss function <math>\tfrac{1}{2}(y - F(x))^2</math> with respect to <math>F(x)</math>, since <math>\tfrac{\partial}{\partial F(x)} \tfrac{1}{2}(y - F(x))^2 = -(y - F(x))</math>. So gradient boosting is a gradient descent algorithm, and generalizing it entails "plugging in" a different loss and its gradient.

Excerpt source: Wikipedia, the free encyclopedia.
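The residual-fitting loop described in this section can be illustrated with a minimal sketch for the squared-error case. It assumes one-dimensional inputs and uses a hand-written one-split regression stump as the weak learner; the names `fit_stump`, `gradient_boost`, `n_stages` and `learning_rate` are illustrative and do not come from the text. With `learning_rate=1.0` the update is the plain sum of the current model and the new estimator; smaller values are a common shrinkage variant that is not part of the informal description.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a one-split regression stump to targets r on 1-D inputs x."""
    xs = np.unique(x)
    if xs.size < 2:
        # No split is possible, so fall back to a constant predictor.
        const = r.mean()
        return lambda q: np.full(np.shape(q), const)
    best = None
    # Candidate thresholds lie halfway between adjacent distinct inputs.
    for t in (xs[:-1] + xs[1:]) / 2.0:
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)


def gradient_boost(x, y, n_stages=20, learning_rate=1.0):
    """Least-squares gradient boosting with stumps (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f0 = y.mean()                      # initial model: predict the mean
    pred = np.full(y.shape, f0)
    stumps = []
    history = [np.mean((y - pred) ** 2)]
    for _ in range(n_stages):
        # Residual = negative gradient of 1/2 * (y - F)^2 with respect to F.
        residual = y - pred
        h = fit_stump(x, residual)
        pred = pred + learning_rate * h(x)
        stumps.append(h)
        history.append(np.mean((y - pred) ** 2))

    def model(q):
        q = np.asarray(q, dtype=float)
        return f0 + learning_rate * sum(h(q) for h in stumps)

    return model, history


# Toy data, chosen only for illustration.
x = np.linspace(0.0, 6.0, 80)
y = np.sin(x)
model, history = gradient_boost(x, y, n_stages=20)
print("MSE of the mean-only model:", history[0])
print("MSE after boosting:", history[-1])
```

Each stage leaves the earlier stumps untouched and only adds a new one fitted to the current residuals, so the training error recorded in `history` never increases from one stage to the next. Replacing the line that computes `residual` with the negative gradient of another differentiable loss is the "plugging in" step mentioned above.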